Articulatory-feature-based confidence measures
نویسندگان
چکیده
Confidence measures are computed to estimate the certainty that target acoustic units are spoken in specific speech segments. They are applied in tasks such as keyword verification or utterance verification. Because many of the confidence measures use the same set of models and features as in recognition, the resulting scores may not provide an independent measure of reliability. In this paper, we propose two articulatory feature (AF) based phoneme confidence measures that estimate the acoustic reliability based on the match in AF properties. While acoustic-based features, such as Mel-frequency cepstral coefficients (MFCC), are widely used in speech processing, some recent works have focus on linguistically based features, such as the articulatory features that relate directly to the human articulatory process which may better capture speech characteristics. The articulatory features can either replace or complement the acoustic-based features in speech processing. The proposed AF-based measures in this paper were evaluated, in comparison and in combination, with the HMM-based scores on phoneme and keyword verification tasks using children s speech collected for a computer-based English pronunciation learning project. To fully evaluate their usefulness, the proposed measures and combinations were evaluated on both native and non-native data; and under field test conditions that mis-matches with the training condition. The experimental results show that under the different environments, combinations of the AF scores with the HMM-based scores outperforms HMM-based scores alone on phoneme and keyword verification. 2005 Elsevier Ltd. All rights reserved.
منابع مشابه
Audiovisual-to-articulatory inversion
It has been shown that acoustic-to-articulatory inversion, i.e. estimation of the articulatory configuration from the corresponding acoustic signal, can be greatly improved by adding visual features extracted from the speaker’s face. In order to make the inversion method usable in a realistic application, these features should be possible to obtain from a monocular frontal face video, where the...
متن کاملAdaptive articulatory feature-based conditional pronunciation modeling for speaker verification
Because of the differences in education background, accents, and so on, different persons have different ways of pronunciation. Therefore, the pronunciation patterns of individuals can be used as features for discriminating speakers. This paper exploits the pronunciation characteristics of speakers and proposes a new conditional pronunciation modeling (CPM) technique for speaker verification. T...
متن کاملArticulatory feature-based conditional pronunciation modeling for speaker verification
Because of the differences in education background, accents, etc., different persons have their unique way of pronunciation. This paper exploits the pronunciation characteristics of speakers and proposes a new conditional pronunciation modeling (CPM) technique for speaker verification. The proposed technique aims to establish a link between articulatory properties (e.g., manners and places of a...
متن کاملModeling pronunciation variation with context-dependent articulatory feature decision trees
We consider the problem of predicting the surface pronunciations of a word in conversational speech, using a model of pronunciation variation based on articulatory features. We build context-dependent decision trees for both phone-based and feature-based models, and compare their perplexities on conversational data from the Switchboard Transcription Project. We find that a fully-factored model,...
متن کاملArticulatory feature asynchrony analysis and compensation in detection-based ASR
This paper investigates the effects of two types of imperfection, namely detection errors and articulatory feature asynchrony, of the front-end articulatory feature detector on the performance of a detection-based ASR system. Based on a set of variable-controlled experiments, we find that articulatory feature asynchrony is the major issue that should be addressed in detection-based ASR. To this...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Computer Speech & Language
دوره 20 شماره
صفحات -
تاریخ انتشار 2006